Background & Context

Thera Bank recently saw a steep decline in the number of users of its credit card. Credit cards are a good source of income for banks because of the different fees they charge, such as annual fees, balance transfer fees, cash advance fees, late payment fees, and foreign transaction fees. Some fees are charged to every user irrespective of usage, while others are charged only under specified circumstances.

Customers leaving the credit card service would mean a loss for the bank. The bank therefore wants to analyze its customer data, identify the customers who are likely to leave the credit card service, and understand their reasons for leaving, so that it can improve in those areas.

As a data scientist at Thera Bank, you need to build a classification model that will help the bank improve its services so that customers do not give up their credit cards.

Objective

Data Dictionary:

Criteria

  1. Perform an Exploratory Data Analysis on the data
  2. Illustrate the insights based on EDA
  3. Data Pre-processing
  4. Model building - Logistic Regression
  5. Model building - Bagging and Boosting
  6. Hyperparameter tuning using grid search
  7. Hyperparameter tuning using random search
  8. Model Performances
  9. Actionable Insights & Recommendations
  10. Notebook - Overall quality

Key Packages

Initial Data Analysis

Findings:

Univariate EDA on numeric variables

Univariate EDA on categorical variables

Bivariate EDA

Findings from EDA:

Data Pre-processing

Splitting Dataset
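Because attrited customers are the minority class, a split that keeps the class ratio the same in both parts is usually preferable. The following is a minimal pure-Python sketch of a stratified split, not the notebook's actual code; the function name `stratified_split`, the toy data, and the 25% test fraction are assumptions made for illustration.

```python
import random


def stratified_split(rows, labels, test_fraction=0.3, seed=1):
    """Split rows/labels so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    def take(idxs):
        return [rows[i] for i in idxs], [labels[i] for i in idxs]

    return take(train_idx), take(test_idx)


# Toy data (made up): 4 attrited customers (1) and 16 existing customers (0).
rows = [[i] for i in range(20)]
labels = [1] * 4 + [0] * 16

(X_train, y_train), (X_test, y_test) = stratified_split(
    rows, labels, test_fraction=0.25
)
print(len(y_train), sum(y_train), len(y_test), sum(y_test))
```

In practice a library routine with a stratification option would normally be used instead of a hand-written helper.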

Model Building

Model evaluation criterion:

The model can make two kinds of wrong predictions:

  1. Predicting that a customer will attrite when they actually stay (false positive) - loss of resources
  2. Predicting that a customer will not attrite and will stay as an existing client when they actually leave (false negative) - loss of opportunity

Which case is more important?

How can this loss be reduced, i.e., how can False Negatives be reduced?
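Reducing false negatives is the same as raising recall, since recall = TP / (TP + FN). The sketch below counts the four outcomes for a binary classifier and computes recall from them. It is only an illustration: the helper names and the toy labels are assumptions, not results from the bank's data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # attrition correctly flagged
        elif predicted == positive:
            fp += 1  # flagged, but the customer stays
        elif actual == positive:
            fn += 1  # missed attrition
        else:
            tn += 1  # correctly predicted to stay
    return tp, fp, fn, tn


def recall(y_true, y_pred, positive=1):
    """Share of actual positives that the model caught."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy labels (made up): 1 = attrited customer, 0 = existing customer.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]

print(confusion_counts(y_true, y_pred))
print(recall(y_true, y_pred))
```

Every false negative removed moves a customer from `fn` to `tp`, so recall rises while the denominator stays fixed.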

Logistic Regression

Oversampling train data using SMOTE
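SMOTE creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The sketch below shows that core idea in pure Python. It is a simplified illustration, not the library implementation: the function name `smote_sketch`, the neighbour count `k=2`, and the toy points are all assumptions. Oversampling is applied to the training data only, so that the test set keeps the original class distribution.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority-class points (made up for illustration).
minority = [(1.0, 1.0), (1.5, 1.2), (2.0, 0.8), (1.2, 1.9)]
new_points = smote_sketch(minority, n_new=6)
print(len(new_points))
```

Each synthetic point lies on a line segment between two real minority points, so it stays inside the region the minority class already occupies.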

Logistic Regression on oversampled data

Regularization

Undersampling train data
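SMOTE only adds synthetic minority samples, so undersampling needs a different technique. One common choice is random undersampling, which drops majority-class rows until the classes are balanced. The sketch below assumes that approach; the source does not confirm which method the notebook used, and the function name and toy data are illustrative.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows so every class has as many rows as the smallest class."""
    rng = random.Random(seed)

    # Group rows by class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    target = min(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        kept = rng.sample(group, target)  # sampling without replacement
        out_rows.extend(kept)
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Toy training data (made up): 8 existing customers (0), 2 attrited (1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2

X_under, y_under = random_undersample(rows, labels)
print(Counter(y_under))
```

All minority rows are kept. The cost is that most majority rows are discarded, which can remove useful information when the training set is small.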

Logistic Regression on undersampled data

Comparing Logistic Regression Models

Bagging and Boosting Models

AdaBoost GridSearchCV

AdaBoost RandomizedSearchCV

XGBoost GridSearchCV

XGBoost RandomizedSearchCV

Decision Tree GridSearchCV

Decision Tree RandomizedSearchCV

Comparing All Models

Business Recommendations